1 Introduction

Both in Boolean and algebraic complexity, the permanent has proven to 
be a central problem and showing lower bounds on its complexity has 
become a major challenge. This central position stems, among other
things, from its #P-completeness [15], its VNP-completeness [14], and
Toda's theorem stating that the permanent is as powerful as the whole
polynomial hierarchy [13]. More recently, it played a role in the celebrated
and subtle result of Kabanets and Impagliazzo [6]: either $\mathrm{NEXP}^{\mathrm{RP}}$ does
not have Boolean circuits of polynomial size, or the permanent does not
have arithmetic circuits of polynomial size.

However, little is known about the circuit complexity of the permanent in
the general case. Indeed, the best lower bound so far on its circuit size is
no more than the trivial $\Omega(n^2)$ (remember that $\mathrm{PER}_n$ has $n^2$ variables).
Despite this rather dark state of affairs, some progress has been made 
on restricted classes of circuits. For instance, we know lower bounds on 
monotone circuits (such circuits for the permanent must have exponential 
size, see [5,11]), and recently, lower bounds on multilinear circuits were 
obtained (see e.g. [8,9,10]). 

A lot of work has also been done on constant-depth circuits, in which
gates have unbounded fan-in. This line of research has been quite
successful on Boolean circuits and gave deep insights into circuit
complexity: see e.g. [3,12]. However, pushing the limit of lower bounds
beyond constant depth for polynomial-size circuits has remained elusive
so far.

Another restriction worth studying is uniformity: circuits are no longer
arbitrary but are required to be described by a Turing machine.
If this description is very efficient (running in time logarithmic in the
size of the circuit, in which case we speak of DLOGTIME-uniformity),
Allender [1] (see also similar results on circuits with modulo gates in
[2]) has shown that the permanent does not have threshold circuits of
constant depth and subexponential size. In this paper, we obtain a
tradeoff between size and depth: instead of subexponential size, we only
prove a superpolynomial lower bound on the size of the circuits, but now
the depth is no longer constant. More precisely, we show the following
theorem.

Theorem 1. The permanent does not have DLOGTIME-uniform
polynomial-size threshold circuits of depth $o(\log\log n)$.

It seems to be the first superpolynomial lower bound on the size of
non-constant-depth threshold circuits for the permanent (though a lower
bound is proved in [10] on multilinear arithmetic circuits of depth
$o(\log n/\log\log n)$). Admittedly, the depth $o(\log\log n)$ is still small but
until now the known techniques were only able to prove lower bounds on 
constant-depth circuits. 

Let us very briefly describe our proof technique. In contrast with [1],
we do not use the relation between threshold circuits and the counting
hierarchy, which restricted the argument to constant-depth circuits. Also,
the diagonalization in [1] is a variant of the nondeterministic time
hierarchy theorem. Here, we use the usual deterministic time hierarchy
theorem as an indirect diagonalization: under the assumption that the
permanent has DLOGTIME-uniform circuits of polynomial size and depth
$o(\log\log n)$, we show that

1. the value of a threshold circuit of size $s$ and depth $d$ can be computed
in time $(\log s)^{2^{O(d)}}$ (Lemma 3 combined with Lemma 1);

2. every language in E has uniform threshold circuits of size $2^{O(n)}$ and
depth $o(\log n)$ (Corollary 4).

These two points together imply that every language in E can be com- 
puted in subexponential time, a contradiction with the time hierarchy 
theorem. 

Since threshold circuits can simulate arithmetic circuits, we also ob- 
tain a superpolynomial lower bound on the size of uniform arithmetic 
circuits of depth o(loglogn) for the permanent (Corollary 7). 

Organization of the paper. The next section is devoted to the
definition of the notions in use: circuits (Boolean, threshold, arithmetic),
uniformity and some complexity classes. Then Section 3 is dedicated to
the proof of Theorem 1, through a series of results following the outline
sketched above.

2 Preliminaries 

The notions we use are very standard but, for completeness, we still recall 
them in this section. 

2.1 Boolean circuits 

A Boolean circuit on $n$ variables is a directed acyclic graph, whose vertices
are labeled either by a variable among $\{x_1, \dots, x_n\}$ or by an operation
among $\{\vee, \wedge, \neg\}$. Vertices of indegree 0 are called inputs, the others are
called gates. A gate labeled by $\neg$ is required to have indegree 1, whereas
gates labeled by $\vee$ or $\wedge$ have indegree 2. A single gate has outdegree 0
and is called the output gate.

The value computed by a vertex is defined recursively: an input $x_i$ has
for value the value of the variable $x_i \in \{0, 1\}$. A $\neg$ gate $g = \neg h$ has for
value the negation of the value of $h$. An $\vee$ gate $g = h_1 \vee h_2$ (respectively
an $\wedge$ gate $g = h_1 \wedge h_2$) has for value the disjunction (resp. conjunction)
of the values of $h_1$ and $h_2$. The value of the circuit is by definition the
value of its output gate.

The size of the circuit is the number of vertices and the depth is the 
length of the longest path from an input vertex to the output gate. 
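
As an illustration of these definitions, here is a minimal sketch in Python of a fan-in 2 Boolean circuit, its value, its size and its depth. The dictionary encoding and the names `circuit`, `evaluate` and `depth` are assumptions made for this example only; they are not part of the paper.

```python
from functools import lru_cache

# Hypothetical encoding: each vertex maps to ("var", index) for an input,
# or to (op, predecessor, ...) for a gate with op in {"not", "and", "or"}.
circuit = {
    "x1": ("var", 0),
    "x2": ("var", 1),
    "g1": ("not", "x1"),
    "g2": ("and", "g1", "x2"),
    "out": ("or", "g2", "x1"),
}

def evaluate(circuit, output, x):
    """Value of the vertex `output` on the Boolean input tuple x."""
    @lru_cache(maxsize=None)
    def value(v):
        label = circuit[v]
        if label[0] == "var":
            return x[label[1]]
        if label[0] == "not":
            return 1 - value(label[1])
        a, b = value(label[1]), value(label[2])
        return a & b if label[0] == "and" else a | b
    return value(output)

def depth(circuit, output):
    """Length of the longest path from an input vertex to `output`."""
    @lru_cache(maxsize=None)
    def d(v):
        label = circuit[v]
        if label[0] == "var":
            return 0
        return 1 + max(d(u) for u in label[1:])
    return d(output)

size = len(circuit)  # number of vertices, inputs included
```

On this example the circuit has size 5 and depth 3, and it computes the disjunction of its two inputs.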

Note that in order to recognize a language, one needs not only
one but a whole family (that is, an infinite sequence) of circuits $(C_n)$, as
explained below. There is also a variant in which $\vee$ and $\wedge$ gates have
unbounded fan-in: this is useful when defining classes of circuits of
constant depth.

2.2 Threshold circuits 

A threshold circuit is defined like a Boolean circuit with $\vee$ and
$\wedge$ gates of arbitrary fan-in, but another type of gate is allowed: threshold
gates (also known as majority gates). A threshold gate is also of arbitrary
fan-in, and its value is 1 if at least half of its inputs have value 1, and
0 otherwise.

Again, in order to recognize a language, a whole family of circuits 
is needed. Remark that it makes sense to consider families of bounded 
depth threshold circuits since gates are allowed to have arbitrary fan-in. 
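
The threshold gate can be written down in one line; the following Python sketch (the names `threshold_gate`, `x`, `layer` and `out` are illustrative assumptions) also evaluates a tiny depth-2 threshold circuit. Note that with the definition above a tie counts as "at least half", so the gate outputs 1.

```python
def threshold_gate(inputs):
    """1 iff at least half of the inputs have value 1 (arbitrary fan-in)."""
    return 1 if 2 * sum(inputs) >= len(inputs) else 0

# A depth-2 example: two threshold gates feeding a third one.
x = [1, 0, 1, 1, 0, 0]
layer = [threshold_gate(x[0:3]), threshold_gate(x[3:6])]
out = threshold_gate(layer)
```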

2.3 Arithmetic circuits 

An arithmetic circuit is defined similarly to a Boolean circuit but with
other kinds of gates. It has $+$, $-$ and $\times$ gates, all of fan-in 2, and besides
variables, another input is labeled by the constant 1. The variables are not
considered to have Boolean values anymore, but instead they are symbolic
and the circuit computes a polynomial (over the ring $\mathbb{Z}$) in the obvious
way: the value of the input gate labeled by 1 is the constant polynomial
1, the value of an input gate labeled by $x_i$ is the polynomial $x_i$, the value
of a $+$ gate (respectively $-$ gate, $\times$ gate) is the sum (resp. difference,
product) of the values of its inputs.

An arithmetic circuit $C$ with $n$ input gates computes a multivariate
polynomial over $\mathbb{Z}$ with $n$ variables. Circuit families $(C_n)$ are used to
compute families of polynomials. The permanent family (also called permanent
for short) is the family $(\mathrm{PER}_n)$ of polynomials defined as follows:

$$\mathrm{PER}_n(x_{1,1}, \dots, x_{n,n}) = \sum_{\sigma} \prod_{i=1}^{n} x_{i,\sigma(i)},$$

where the sum is taken over all the permutations $\sigma$ of $\{1, \dots, n\}$. The
variables $x_{i,j}$ can be viewed as the coefficients of an $n \times n$ matrix, allowing
us to speak of the permanent of a matrix.
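
For concreteness, the definition can be evaluated directly on a numerical matrix. The Python sketch below is a naive illustration that sums over all $n!$ permutations; the function name `permanent` is an assumption of this example, and the method is of course not an efficient algorithm.

```python
from itertools import permutations
from math import prod

def permanent(matrix):
    """Permanent of a square matrix, by direct expansion of the definition."""
    n = len(matrix)
    return sum(
        prod(matrix[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )
```

For a 0-1 matrix this counts the permutations $\sigma$ such that every entry $x_{i,\sigma(i)}$ equals 1; for instance the all-ones $3 \times 3$ matrix has permanent $3! = 6$.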

2.4 Uniformity 

Circuits, be they Boolean, threshold or arithmetic, are finite objects easily 
encoded in binary (e.g. by the list of their vertices and edges). Hence they 
can be handled by Turing machines. 

As already mentioned, we are interested in sequences $(C_n)$ of circuits
in order to recognize languages. In full generality, no assumption is
made on the structure of these circuits: in particular, the Boolean
encodings of the circuits of a family may be uncomputable. However, if a single
Turing machine is able to produce the Boolean encoding of all the
circuits of the family, then we speak of uniformity. The degree of uniformity
depends on the resources needed by the machine.

A family of circuits $(C_n)$ is said to be P-uniform if there exists a
deterministic Turing machine which, on input $(n, i)$ given in binary, outputs the
$i$-th bit of the encoding of $C_n$ in time polynomial in $n$ (that is, in time
exponential in the size of the input). Similarly, a family of circuits $(C_n)$
is said to be DLOGTIME-uniform if there exists a deterministic Turing machine
which, on input $(n, i)$ given in binary, outputs the $i$-th bit of the encoding
of $C_n$ in time logarithmic in $n$ (that is, in time linear in the size of the
input). Of course, DLOGTIME-uniformity implies P-uniformity. It can be
argued that DLOGTIME-uniformity is the right notion of uniformity for
small-depth circuits, see [7].

Remark 1. In the remainder of the paper, we shall work with DLOGTIME- 
uniformity, but everything remains valid if replaced by "polylogtime" 
uniformity. 

2.5 Complexity classes 

Finally, we will meet some complexity classes, which we define now. Let
$\mathrm{DTIME}(t(n))$ denote the set of languages recognized in time $t(n)$ by
a deterministic Turing machine. Then P is the class
$\mathrm{DTIME}(n^{O(1)}) = \bigcup_{k>0} \mathrm{DTIME}(n^k)$ (that is, deterministic polynomial time) and E is the
class $\mathrm{DTIME}(2^{O(n)}) = \bigcup_{k>0} \mathrm{DTIME}(2^{kn})$ (that is, deterministic
exponential time with linear exponent).

Recall the time hierarchy theorem [4]: for time-constructible
functions $f$ and $g$, if $f(n)/g(n) = o(1/\log(g(n)))$ then
$\mathrm{DTIME}(g(n)) \not\subseteq \mathrm{DTIME}(f(n))$. In particular, we will use the following
consequence: $\mathrm{E} \not\subseteq \mathrm{DTIME}(n^{2^{o(\log n)}})$.

The class #P is the set of functions $f : \{0, 1\}^* \to \mathbb{N}$ defined as follows:
there exist a polynomial $p$ and a language $A \in \mathrm{P}$ such that
$f(x) = \#\{y \in \{0, 1\}^{p(|x|)} : (x, y) \in A\}$. Computing the permanent of a 0-1 matrix is
#P-complete (Valiant [14]). Then PP is the set of languages $B$ such that there
is $f \in \#\mathrm{P}$ satisfying $[x \in B \iff f(x) \geq 2^{p(|x|)-1}]$. The class PP can also
be viewed as the languages $B$ such that there exists a polynomial-time
nondeterministic Turing machine $N$ satisfying [$x \in B$ iff at least half of
the computation paths of $N(x)$ are accepting]. Note that if every function
in #P can be computed in polynomial time, then PP = P.

Complexity classes can also be defined in terms of circuits (either
Boolean or threshold). An input $x$ is accepted by a circuit $C$ if the value
of $C$ on $x$, denoted by $C(x)$, is 1. In order to recognize languages, families
$(C_n)$ of circuits are considered: circuit $C_n$ will recognize inputs of size
$n$, hence we make the assumption that $C_n$ has $n$ input gates. Now, a
language $A$ is recognized by a family $(C_n)$ of circuits if
$A = \{x \in \{0, 1\}^* : C_{|x|}(x) = 1\}$.

We shall use the well known characterization of P in terms of circuits: 
P is the set of languages recognized by P-uniform families of polynomial- 
size Boolean circuits. The class $\mathrm{AC}^0$ is the set of languages recognized
by a family of constant-depth Boolean circuits of polynomial size, where
the $\vee$ and $\wedge$ gates have unbounded fan-in. The class $\mathrm{TC}^0$ is the set of
languages recognized by a family of constant-depth threshold circuits
of polynomial size. Uniform versions of these classes, DLOGTIME-$\mathrm{AC}^0$
and DLOGTIME-$\mathrm{TC}^0$ respectively, are defined by requiring
DLOGTIME-uniformity on the circuit family.

3 Technical developments 

This series of results is devoted to the proof of Theorem 1. 

Lemma 1. If the permanent has P-uniform polynomial-size threshold
circuits then PP = P.

Proof. First turn the threshold circuits into Boolean circuits. To this end,
every $\wedge$ or $\vee$ gate of unbounded fan-in is replaced by a tree of $\wedge$ or $\vee$
gates of fan-in 2 (the circuit clearly remains P-uniform and of polynomial
size), and every threshold gate with $N = n^{O(1)}$ inputs is replaced by the
addition of the inputs followed by a comparison of the result with $N/2$.
This iterated addition can easily be carried out by a P-uniform circuit
of size polynomial in $N$, hence polynomial in $n$. This proves that the
permanent has P-uniform polynomial-size Boolean circuits.

Thus, by #P-completeness of the permanent, every function in #P can
be computed in polynomial time. This implies that PP = P. □
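
The replacement of a threshold gate by an iterated addition followed by a comparison, used in the proof above, can be sketched as follows. This is only an illustration in Python under stated assumptions: the helper names (`full_adder`, `add_bit`, `threshold_by_addition`) are hypothetical, the adder uses only fan-in 2 Boolean operations on bits, and the final comparison with $N/2$ is written with integer arithmetic for brevity rather than as a Boolean comparator.

```python
def full_adder(a, b, c):
    """Sum bit and carry of three bits, using only AND, OR, NOT on bits."""
    xor_ab = (a | b) & (1 - (a & b))
    s = (xor_ab | c) & (1 - (xor_ab & c))
    carry = (a & b) | (xor_ab & c)
    return s, carry

def add_bit(counter, bit):
    """Add one bit to a little-endian binary counter by ripple carry."""
    out, carry = [], bit
    for c in counter:
        s, carry = full_adder(c, 0, carry)
        out.append(s)
    return out  # the counter is wide enough, so the final carry is 0

def threshold_by_addition(bits):
    """Threshold gate simulated by counting the ones, then comparing with N/2."""
    width = len(bits).bit_length()  # enough bits to count up to N
    counter = [0] * width
    for bit in bits:
        counter = add_bit(counter, bit)
    total = sum(c << i for i, c in enumerate(counter))
    return 1 if 2 * total >= len(bits) else 0
```

Each call to `add_bit` uses a number of fan-in 2 operations linear in the counter width, so the whole simulation has size polynomial in $N$, in line with the size bound claimed in the proof.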

As a preparation to the proof of Lemma 3, let us first rephrase the hy- 
pothesis PP = P in a convenient way. 

Lemma 2. Let $A$ be a language with a (deterministic) algorithm running
in time $t(n) \geq n$. Consider the following problem $B$: given a word $x$, a
length $n$ and an integer $N \leq 2^n$, decide whether at least $N$ words $y$ of
size $n$ satisfy $(x, y) \in A$.

If PP = P then $B$ has an algorithm running in time $p(t(n))$ for a fixed
polynomial $p$ (independent of $A$).

Proof. Note that this is not a completely obvious consequence
of PP = P since the polynomial $p$ is required to be independent of
$A$. In fact this comes from the existence of a complete problem for
PP. Take indeed the canonical PP-complete language
$H = \{(M, x, 1^n) : \text{at least half of the computation paths of } M(x) \text{ are accepting in time } n\}$,
where $M$ is a nondeterministic Turing machine. The hypothesis PP = P
implies that $H$ is decidable in time $p(n)$ for some polynomial $p$.

To the problem $B$ is associated a language
$\tilde{B} = \{(x, n, N, 1^{t(n)}) : \#\{y \in \{0, 1\}^n : (x, y) \in A\} \geq N\}$.
Then $\tilde{B}$ is in PP and a reduction from $\tilde{B}$ to $H$ is the mapping
$(x, n, N, 1^{t(n)}) \mapsto (M, (n, N, x), 1^{t(n)})$, where
$M(n, N, x)$ has the following behaviour: it guesses a bit $b \in \{0, 1\}$; if
$b = 0$ then it creates $2^n - N$ accepting paths among $2^n$ paths; if $b = 1$
then it guesses $y \in \{0, 1\}^n$ and decides whether $(x, y) \in A$ by running the
algorithm for $A$ in time $t(n)$. Therefore $M(n, N, x)$ runs in time $O(t(n))$,
has $2^{n+1}$ paths, and among them $\#\{y \in \{0, 1\}^n : (x, y) \in A\} + (2^n - N)$
are accepting. This is at least half iff $\#\{y \in \{0, 1\}^n : (x, y) \in A\} \geq N$.
This reduction shows that $B$ is decidable in time $p(t(n))$. □
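
The path-counting argument in this proof can be checked numerically. In the Python sketch below, `count` stands for $\#\{y \in \{0,1\}^n : (x,y) \in A\}$; the function names are assumptions introduced for this illustration only.

```python
def accepting_paths(n, N, count):
    """Number of accepting paths of the machine M(n, N, x) from the proof."""
    branch0 = 2 ** n - N  # b = 0: 2^n - N accepting paths among 2^n paths
    branch1 = count       # b = 1: one accepting path per y with (x, y) in A
    return branch0 + branch1

def at_least_half(n, N, count):
    """True iff at least half of the 2^(n+1) paths are accepting."""
    total_paths = 2 ** (n + 1)
    return 2 * accepting_paths(n, N, count) >= total_paths
```

For every choice of $n$, $N \leq 2^n$ and `count` $\leq 2^n$, `at_least_half(n, N, count)` holds exactly when `count >= N`, which is the equivalence used at the end of the proof.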

Similarly to the succinct representations used for exponential-time-complete
languages, threshold circuits can be succinctly given, not by their binary
encoding but rather by a description of their gates. That is, instead of
giving the threshold circuit $C$ directly, a Boolean circuit $B$ is given, whose
value $B(i)$ on input $i$ is the $i$-th bit of the encoding of $C$. This may allow
a much shorter representation of the circuit. Circuits given in that
way will be called "succinctly given".
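
To make the idea of a succinct description concrete, here is a small Python sketch. Everything in it is a hypothetical choice made for illustration: the circuit $C$ consists of $2^{20}$ inputs feeding a single threshold gate, its encoding is taken to be an adjacency matrix read row by row, and the Python function `encoding_bit` plays the role of the Boolean circuit $B$.

```python
M = 20                       # hypothetical parameter: C has 2**M inputs
NUM_INPUTS = 2 ** M
OUTPUT = NUM_INPUTS          # index of the single threshold gate
NUM_VERTICES = NUM_INPUTS + 1

def encoding_bit(i):
    """i-th bit of the adjacency-matrix encoding of C, computed on demand.

    Bit i = u * NUM_VERTICES + v is 1 iff there is an edge from u to v.
    """
    u, v = divmod(i, NUM_VERTICES)
    return 1 if (v == OUTPUT and u < NUM_INPUTS) else 0
```

The full encoding has about $2^{40}$ bits, yet each bit is obtained from a few arithmetic operations on the index $i$, which has roughly 40 bits: the description is exponentially shorter than the object it describes.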

Lemma 3. Let $A$ be the problem of deciding the value of a succinctly
given threshold circuit, that is,

$A = \{(B, x) : B \text{ represents a threshold circuit } C \text{ and } C(x) = 1\}$

where $B$ is a Boolean circuit and $x$ is a Boolean input to $C$ of appropriate
size. The size of the threshold circuit $C$ is denoted by $s$ and its depth by $d$.
Suppose furthermore that the size of the input $(B, x)$ is less than $(\log s)^{2^d}$.
If PP = P, then $A$ has an algorithm of running time $(\log s)^{2^{O(d)}}$.

Proof. The idea is to recursively evaluate the values of the gates at each
depth of the circuit, using Lemma 2 for threshold gates. In order to apply
Lemma 2, one has to consider all the inputs of a particular gate, leading
us to define the language $A_k$ corresponding to the gates being inputs of
the $i$-th gate of $C$, whose depth is $\leq k$, as follows:

$A_k = \{(B, x, i, j) :$ $B$ represents a threshold circuit $C$ in which
gate number $i$ is at depth $\leq k$,
gate number $j$ is an input of gate $i$, and
the value of gate $j$ in the computation $C(x)$ is 1$\}$

Note that one can artificially add to $C$ a final "identity gate" taking as
input the output of $C$, in which case deciding $A_{d+1}$ implies computing
the value of $C(x)$.

Let us call $T(k, s, d)$ the time needed to decide $A_k$ as a function of the
size $s$ and the depth $d$ of $C$. The language $A_2$ merely consists in evaluating
an input gate, that is, deciding to which bit of $x$ it corresponds: this can
be done in polynomial time, hence in time $(\log s)^{2^{O(d)}}$ by assumption on
the size of $(B, x)$. Therefore $T(2, s, d) = (\log s)^{2^{O(d)}}$.

The purpose is now to decide $A_{k+1}$ by using the algorithm for $A_k$. It
can be done easily since we can decide the value of a gate at depth $k$ if
we know the values of the gates at depth $\leq k - 1$. Indeed, let us decide
whether $(B, x, i, j) \in A_{k+1}$, supposing gate $i$ is at depth $k + 1$ and has
gate $j$ as input: we want to compute the value of gate $j$. Since gate $j$ is
at depth $\leq k$, the algorithm for $A_k$ provides the value of all the inputs of
gate $j$, which are used in turn to compute the value of gate $j$ itself. More
precisely we proceed inductively:

- If gate $j$ is a $\neg$ gate, that is, $j = \neg g$, then the value of $j$ is the
  negation of the value of $g$.

- If gate $j$ is an $\vee$ or an $\wedge$ gate, that is, $j = g \circ h$ with $\circ \in \{\vee, \wedge\}$, then
  we perform the corresponding Boolean operation on the values of $g$
  and $h$.

- Finally, if gate $j$ is a threshold gate, it has at most $s - 1$ inputs and
  we decide whether at least half of them evaluate to 1.

Let us bound the execution time $T(k + 1, s, d)$ of this algorithm for $A_{k+1}$
as a function of $T(k, s, d)$ (the execution time of the algorithm for $A_k$). In
the first case, we take the negation of one request of the form $(B, x, j, g) \in A_k$,
therefore we have the following relation: $T(k + 1, s, d) = T(k, s, d) + O(1)$.
Similarly, in the second case we make a Boolean combination of two
requests (one for each input), hence $T(k + 1, s, d) \leq 2T(k, s, d) + O(1)$.

Finally in the third case, the task is to decide whether at least half of
the inputs $y$ of gate $j$ evaluate to 1. Applying Lemma 2 to the language
$A_k$ with requests of the form $(B, x, j, y) \in A_k$ for all gates $y$ input of $j$,
yields $T(k + 1, s, d) \leq p(T(k, s, d))$ for some fixed polynomial $p$.

As a whole, we have the following relation, for a fixed polynomial $p$:

$$T(k + 1, s, d) \leq p(T(k, s, d)).$$

In other words, there exists an exponent $a \in \mathbb{N}$ such that
$T(k + 1, s, d) \leq T(k, s, d)^a$, hence $T(k, s, d) \leq T(2, s, d)^{a^k}$.
Since $T(2, s, d) = (\log s)^{2^{O(d)}}$ and deciding $A$ requires going up to
$k = d + 1$, there is an algorithm for $A$ running in time

$$T(d + 1, s, d) = (\log s)^{2^{O(d)}}.$$

□
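
The unrolling of the recurrence at the end of this proof can be spelled out as follows. This is a sketch under the assumptions that $T(k+1, s, d) \leq T(k, s, d)^a$ holds for a constant $a \geq 2$, that the base case is $T(2, s, d) = (\log s)^{2^{O(d)}}$, and that $d \geq 1$.

```latex
% Assumptions: T(k+1,s,d) <= T(k,s,d)^a with a >= 2 constant,
% base case T(2,s,d) = (\log s)^{2^{O(d)}}, and d >= 1.
\begin{align*}
T(k, s, d) &\le T(2, s, d)^{a^{k-2}}
  \quad (k \ge 2,\ \text{by induction on } k), \\
T(d+1, s, d) &\le \bigl((\log s)^{2^{O(d)}}\bigr)^{a^{d-1}}
  = (\log s)^{2^{O(d)} \cdot 2^{(d-1)\log_2 a}}
  = (\log s)^{2^{O(d)}}.
\end{align*}
```

The last equality uses that $a$ is a constant, so that $(d-1)\log_2 a = O(d)$ and the product of two factors of the form $2^{O(d)}$ is again of that form.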



Lemma 3 concerns the evaluation of succinctly given threshold circuits; 
the consequence for languages is the following. 

Corollary 1. Suppose a language $A$ has threshold circuits of size $s(n)$,
depth $d(n)$ and constructible in polynomial time (that is, the $i$-th bit of $C_n$
is computable in time $n^{O(1)}$). Suppose furthermore that $(\log s(n))^{2^{O(d(n))}}$
is superpolynomial in $n$.

If PP = P, then $A$ has an algorithm of running time $(\log s(n))^{2^{O(d(n))}}$.

Let us now see how to relate the hypothesis on the permanent to decision
languages. We need the following result concerning the completeness of
the permanent under a very strong notion of reduction. This result
appears in [2] as a careful analysis of the usual reduction of Valiant [15] (see
also [16] for many-one reductions), which can in fact be carried out in a
much more efficient way than just polynomial time.

Proposition 1. The permanent of 0-1 matrices is hard for #P under
DLOGTIME-uniform $\mathrm{AC}^0$ many-one reductions, that is, the reduction is
computed by DLOGTIME-uniform $\mathrm{AC}^0$ circuits.

Corollary 2. Every language $A \in \mathrm{P}$ can be expressed as the
permanent of a 0-1 matrix $M$ of size $n^{O(1)}$, computed by DLOGTIME-uniform
$\mathrm{AC}^0$ circuits. More precisely, there are functions $M$ and $a$ computed by
DLOGTIME-uniform $\mathrm{AC}^0$ circuits such that $x \in A \Rightarrow a(\mathrm{PER}(M(x))) = 1$
and $x \notin A \Rightarrow a(\mathrm{PER}(M(x))) = 0$.

Scaling up this result to exponential time yields the following corollary. 

Corollary 3. For every language $A \in \mathrm{E}$, there are two functions $M$ and
$a$ computable by size $2^{O(n)}$, constant-depth Boolean circuits constructible
in polynomial time (that is, the $i$-th bit of the circuit is computable in
time $n^{O(1)}$), such that $x \in A \Rightarrow a(\mathrm{PER}(M(x))) = 1$ and
$x \notin A \Rightarrow a(\mathrm{PER}(M(x))) = 0$.

This implies the following result. 

Corollary 4. If the permanent has DLOGTIME-uniform polynomial-size
threshold circuits of depth $d(n)$, then every language $A$ in E has threshold
circuits of size $2^{O(n)}$ and depth $O(d(2^{O(n)}))$, these circuits being
constructible in polynomial time (that is, the $i$-th bit of $C_n$ is computable in
time $n^{O(1)}$).

Proof. By Corollary 3, membership in $A$ is decided by the permanent of
a matrix $M(x)$ of size $2^{O(n)}$. It is enough to compute the matrix $M(x)$ by
constant-depth uniform circuits, then to plug the result into the uniform
threshold circuits of depth $d(2^{O(n)})$ for the permanent of matrices of size
$2^{O(n)}$, and finally to apply the function $a$ computed by constant-depth
uniform circuits. The resulting circuits are again uniform threshold circuits
of depth $O(d(2^{O(n)}))$. □

Combining Corollary 4, Lemma 1 and Corollary 1 yields the following. 

Corollary 5. If the permanent has DLOGTIME-uniform polynomial-size
threshold circuits of depth $d(n)$, then $\mathrm{E} \subseteq \mathrm{DTIME}(n^{2^{O(d(2^{O(n)}))}})$.

This is in contradiction with the time hierarchy theorem as soon as d{n) = 
o(loglogn), hence we have proved our main result: 

Corollary 6 (Theorem 1). The permanent does not have DLOGTIME-uniform
polynomial-size threshold circuits of depth $o(\log\log n)$.
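
The arithmetic behind the contradiction can be made explicit. The following is a sketch of the calculation, assuming that the circuits of Corollary 4 (size $2^{O(n)}$, depth $O(d(2^{O(n)}))$) are fed into the running time $(\log s(n))^{2^{O(d(n))}}$ of Corollary 1, and that $d(m) = o(\log\log m)$.

```latex
% Assumption: d(m) = o(\log\log m), applied with m = 2^{O(n)}.
\begin{align*}
d(2^{O(n)}) &= o(\log\log 2^{O(n)}) = o(\log n), \\
2^{O(d(2^{O(n)}))} &= 2^{o(\log n)} = n^{o(1)}, \\
\bigl(\log 2^{O(n)}\bigr)^{2^{O(d(2^{O(n)}))}}
  &= n^{O(1) \cdot n^{o(1)}} = 2^{n^{o(1)} \log n} = 2^{n^{o(1)}}.
\end{align*}
```

Every language in E would thus be decidable in subexponential time $2^{n^{o(1)}}$, which the time hierarchy theorem rules out.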

Since an arithmetic circuit can be simulated by a threshold one (addition 
and multiplication are indeed in DLOGTIME-uniform TC°), we obtain the 
following corollary. 

Corollary 7. The permanent does not have DLOGTIME-uniform
polynomial-size arithmetic circuits of depth $o(\log\log n)$.

Acknowledgments — The authors want to thank Eric Allender for useful 
discussions. 